AutoASC - A System for Automatic Acquisition of Sense Tagged Corpora
Abstract
Many natural language processing tasks, such as word sense disambiguation, knowledge acquisition, and information retrieval, rely on semantically tagged corpora. Until recently, these corpus-based systems depended on text manually annotated with semantic tags, but the massive human effort this requires has become a serious impediment to building robust systems. In this paper, we present AutoASC, a system that automatically acquires sense-tagged corpora. It is based on (1) the information provided in WordNet, particularly the word definitions found within the glosses, and (2) information gathered from the Internet using existing search engines. The system was tested on a set of 46 concepts, for which 2071 example sentences were acquired; for these, a precision of 87% was observed.
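The abstract names two ingredients, WordNet glosses and Web search, without spelling out how they are combined. The sketch below is one plausible reading, not the paper's actual procedure: it turns the definition part of a gloss into search queries and labels returned snippets that contain the target word with the sense the queries were built from. The `Sense` class, the `gloss_queries` and `tag_snippets` helpers, the stopword list, the sense identifier, and the sample gloss are all hypothetical, and no real search engine is called.

```python
import re
from dataclasses import dataclass

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "for",
             "that", "is", "by", "with", "on"}


@dataclass
class Sense:
    """Hypothetical container for one word sense and its gloss."""
    word: str
    sense_id: str
    gloss: str


def gloss_queries(sense, max_terms=3):
    """Build two search queries from the definition part of a gloss.

    Assumption: the definition is the text before the first ';'.
    """
    definition = sense.gloss.split(";")[0].strip()
    tokens = re.findall(r"[a-z]+", definition.lower())
    content = [t for t in tokens if t not in STOPWORDS and t != sense.word]
    phrase_query = f'"{definition}"'  # exact-phrase query
    keyword_query = " AND ".join([sense.word] + content[:max_terms])
    return [phrase_query, keyword_query]


def tag_snippets(sense, snippets):
    """Keep snippets containing the target word as a whole token and
    label each with the sense whose gloss produced the queries."""
    pattern = re.compile(rf"\b{re.escape(sense.word)}\b", re.IGNORECASE)
    return [(s, sense.sense_id) for s in snippets if pattern.search(s)]


# Illustrative sense entry; the identifier is made up for this example.
example = Sense(
    word="interest",
    sense_id="interest#example",
    gloss="a fixed charge for borrowing money; usually a percentage "
          "of the amount borrowed",
)
queries = gloss_queries(example)
tagged = tag_snippets(example, [
    "The bank raised the interest on the loan.",
    "An interesting book.",
])
```

In a real pipeline the queries would be sent to a search engine and the retrieved sentences filtered further; the reported 87% precision suggests that some acquired examples still carry the wrong sense, so a validation step would remain useful.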
Similar resources
Automatic Acquisition of Sense Tagged Corpora
An important problem in Natural Language Processing is identifying the correct sense of a word in a particular context. Thus far, statistical methods have been considered the best techniques in word sense disambiguation. Unfortunately, these methods produce high accuracy results only for a small number of preselected words. The reduced applicability of statistical methods is due basically to the...
Towards Automatic Acquisition of a Fully Sense Tagged Corpus for Persian
Sense tagged corpora play a crucial role in Natural Language Processing, particularly in Word Sense Disambiguation and Natural Language Understanding. Since semantic annotations are usually performed by humans, such corpora are limited to a handful of tagged texts and are not available for many languages with scarce resources including Persian. The shortage of efficient, reliable linguistic res...
Automatic Resolution of Ambiguous Terms Based on Machine Learning and Conceptual Relations in the UMLS
Methods. For a term W that represents multiple UMLS concepts, a collection of MEDLINE abstracts that contain W is extracted. For each abstract in the collection, occurrences of concepts that have relations with W as defined in the UMLS are automatically identified. A sense-tagged corpus, in which senses of W are annotated, is then derived based on those identified concepts. The method was evalu...
Empirical Acquisition Of Differentiating Relations From Definitions
This paper describes a new automatic approach for extracting conceptual distinctions from dictionary definitions. A broad-coverage dependency parser is first used to extract the lexical relations from the definitions. Then the relations are disambiguated using associations learned from tagged corpora. This contrasts with earlier approaches using manually developed rules for disambiguation.
Unsupervised WSD based on Automatically Retrieved Examples: The Importance of Bias
This paper explores the large-scale acquisition of sense-tagged examples for Word Sense Disambiguation (WSD). We have applied the “WordNet monosemous relatives” method to construct automatically a web corpus that we have used to train disambiguation systems. The corpus-building process has highlighted important factors, such as the distribution of senses (bias). The corpus has been used to trai...
Journal: IJPRAI
Volume 14
Publication date: 2000